Written by Heidi Baumgartner and Rachel Martino at the Developmental Cognitive Neuroscience Lab at Brown University, directed by Dr. Dima Amso

Overview

Some eye-tracking (ET) systems collect independent data streams from each eye (left and right), but analysis protocols often require the user to choose one eye or the other for analysis purposes. In optimal circumstances, it makes little difference which eye is used because these data streams are nearly identical. Sometimes, however, the data stream from one eye is significantly more stable or accurate than the stream from the other eye (e.g., this situation occurs often when tracking very young participants with a between-eye distance that is at the lower limit of what the eye tracker can handle).

This toolbox is designed to quantify the TRACKING RATIO (percent of ET samples with non-zero gaze coordinates) and calibration ACCURACY (deviation of estimated point-of-regard from defined coordinates) by eye (R/L) for eye tracking data and to RECOMMEND which eye to use for subsequent analyses based on these metrics. The toolbox also generates a PRECISION metric (modeled after Wass et al., 2014), which indicates the amount of sample-to-sample jitter or noise that is present in the data for each eye. This metric is currently used for visualization purposes only, and is not included in the recommendation decision.

The toolbox is designed to work with data generated by BeGaze (SensoMotoric Instruments), but should work as long as the variables/format specified in Using a non-SMI system are present.


Instructions

Refer to the ET Quality Toolbox USER GUIDE for instructions on using the toolbox to generate quality metrics and eye recommendations.


Set parameters

These are the experiment-specific parameters that must be defined by the user.
Once these parameters have been set, you should not have to edit code in other sections beyond tinkering with plot sizing.

  1. Update datadir to point to the data directory and set outputname to the desired name of the summary output file
    • IMPORTANT: Make sure that the only .txt file(s) in the directory specified by datadir are raw gaze data in the format specified above, because the toolbox will attempt to load and append all .txt files in datadir. Any files not in this format will cause the toolbox to break.
## path to data directory
datadir <- "/Users/heidibaumgartner/Documents/GitHub/EyeTrackingQuality_Toolbox/ETQuality_toolbox/ExampleExpt_Data"

## file name for summary metrics
outputname <- "ExampleExp_ETQuality_output.csv"


  2. Define ET system parameter(s). This value is used to determine the duration (in ms) of each sample.
## sampling rate of eye tracker (Hz)
EyeTrackHz <- 60


  3. Set thresholds for ‘meaningful’ differences (i.e., discrepancies that are large enough to matter and should therefore be used for the eye choice determination) for the following metrics:
    • Tracking ratio (Threshold_TrackingDiff): default difference threshold is 5% (i.e., only use tracking ratio for eye choice recommendation if TR for one eye is >5% better than TR for other eye)
    • Validation accuracy (Threshold_AccuracyDiff): default difference threshold is 25 pixels (i.e., only use validation deviation values for eye choice recommendation if deviation values for one eye are >25 pixels closer to validation-stimulus center than deviation values for other eye)
    • Precision (Threshold_PrecisionDiff): default difference threshold is 1. Precision is currently used for visualization only and is not factored into the eye recommendation.
## threshold for meaningful tracking ratio difference (%)
Threshold_TrackingDiff <- 5 # %

## threshold for meaningful deviation difference (pixels)
Threshold_AccuracyDiff <- 25  # pixels 

## threshold for meaningful precision difference 
Threshold_PrecisionDiff <- 1   


  4. Define the stimuli to be used for validation purposes (ValidationStim), their AOIs (ValidationAOI), and the stimulus center coordinates (ValidationX/Y)
    • Make sure that these four lists all have the same number of items
    • If using the DCNL custom validation block, use default values
    • Stimulus names in ValidationStim must match text in Stimulus column of data file(s) EXACTLY (case-sensitive)
    • AOI names in ValidationAOI must match text in AOI Name Left/Right columns of data files EXACTLY (case-sensitive)
## define names of stimuli used to measure validation accuracy (case-sensitive)
ValidationStim <- c('validation1.jpg',
                    'validation2.jpg',
                    'validation3.jpg',
                    'validation4.jpg')

## AOIs for corresponding items in 'ValidationStim' (case-sensitive)
ValidationAOI <- c('validation1',
                   'validation2',
                   'validation3',
                   'validation4')  

## X coordinates for center of corresponding items in 'ValidationStim'
ValidationX <- c(480, 
                 1440, 
                 480, 
                 1440) 

## Y coordinates for center of corresponding items in 'ValidationStim'
ValidationY <- c(270, 
                 270, 
                 810, 
                 810) 


  5. Define the experimental stimuli (ExperimentalStim) that you want to INCLUDE in tracking ratio and distance calculations
    • To define non-experimental stimuli to EXCLUDE from these calculations instead, COMMENT OUT ExperimentalStim definition and use/edit FillerStim instead.
    • If you want to include all stimuli in tracking ratio/distance calculations, comment out ExperimentalStim and use FillerStim <- c('nothing')
## define stimulus names to be INCLUDED in tracking ratio/distance calculations 
ExperimentalStim <- c('Pic1.jpg', 'Pic2.jpg', 'Pic3.jpg', 'Movie1.avi', 'Movie2.avi')

## define filler stimulus names to be EXCLUDED from tracking ratio/distance calculations
# FillerStim <- c('Validation.jpg', 'validation1.jpg', 'validation2.jpg', 'validation3.jpg', 'validation4.jpg', 'Fixation.png')

## to include ALL stimuli in tracking ratio/distance calculations, use this instead
# FillerStim <- c('nothing')
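The inclusion/exclusion logic above can be sketched as a simple subset operation. This is an illustrative sketch, not the toolbox's internal code; the data frame and its Stimulus column are hypothetical stand-ins for the loaded gaze data.

```r
## Sketch of how the inclusion/exclusion lists subset the data
## (the data frame and 'Stimulus' column here are illustrative).
dat <- data.frame(Stimulus = c('Pic1.jpg', 'Fixation.png', 'Movie1.avi'))

## With ExperimentalStim defined, keep only the listed stimuli:
keep <- dat$Stimulus %in% c('Pic1.jpg', 'Movie1.avi')
## With FillerStim instead, keep everything NOT in that list:
# keep <- !(dat$Stimulus %in% c('Fixation.png'))
dat_included <- dat[keep, , drop = FALSE]
```

With FillerStim <- c('nothing'), the second form keeps every row, which is why that value includes all stimuli.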


  6. Choose whether to save plots to individual files (saveplotstofile) and optionally edit plot aesthetics (e.g., color, marker shape)
## save plots as individual files?
saveplotstofile <- 1 # 0=no, 1=yes 

## Aesthetics for plots (can change as desired)
marker_colors <- c('#007c92', '#e98300', '#8c1515','#8c1515') 
marker_fills <- c('#007c92', NA, NA, NA) # NA for no fill
marker_shapes <- c(21,21,4,4) # 21=circle, 22=square, 23=diamond, 24=triangle, 25=invertedtriangle, 4=X
marker_size <- 2    # size of markers on point plots
marker_stroke <- 2  # width of marker stroke on point plots
legend_size <- 3    # size of markers in plot legends
title_color <-  '#8c1515' 
title_size <- 16    # font size of plot titles
yaxis_size <- 12     # font size of y-axis titles
line_colors <- c('#68a2c1','#0a8002','#808080')

## grayscale color options
# marker_colors <- c('gray', 'black', 'black', 'black')
# marker_fills <- c('gray', NA, NA, NA)
# title_color <- 'black'


  7. Choose whether to run the supplementary Precision section (this section takes a very long time to run with large amounts of data)
## to include Precision chunks, prec_on = TRUE
## to exclude Precision chunks, prec_on = FALSE
prec_on <- TRUE




Load and organize data

[You shouldn’t need to change anything here unless you are using data generated outside of BeGaze and need to adjust for different variable names in data file(s).]




Tracking ratio

Tracking ratio is calculated by dividing the number of samples with non-zero gaze data (TrackedSamples) by the total number of samples (TotalSamples) for the subset of stimuli defined above (either the stimuli listed in ExperimentalStim, or all stimuli other than those listed in FillerStim), then multiplying by 100 to convert to a percentage.

\[ TR = (\frac{TrackedSamples}{TotalSamples}) * 100 \]
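As a minimal sketch of this calculation (assuming a data frame with hypothetical gaze-coordinate columns, where untracked samples are recorded as zero, as in BeGaze exports):

```r
## Minimal sketch of the tracking-ratio calculation.
## Column names (L_POR_X, R_POR_X) are illustrative, not the toolbox's.
tracking_ratio <- function(gaze_x) {
  tracked <- sum(gaze_x != 0, na.rm = TRUE)  # TrackedSamples
  total   <- length(gaze_x)                  # TotalSamples
  (tracked / total) * 100                    # TR, as a percentage
}

## Example: 3 of 4 samples tracked for the left eye, 2 of 4 for the right
gaze <- data.frame(L_POR_X = c(512, 0, 498, 505),
                   R_POR_X = c(510, 0, 0, 500))
tracking_ratio(gaze$L_POR_X)  # 75
tracking_ratio(gaze$R_POR_X)  # 50
```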



Table 1: Tracking ratio values and eye comparisons

  • BetterTracking: Eye with better tracking ratio (difference between eyes not necessarily above threshold!)
  • TrackingDiff: Absolute difference of L and R tracking ratios
##    Subject BetterTracking TrackingDiff TrackingRatio_L TrackingRatio_R
## 1:     P01             NA            0             100             100
## 2:     P02              L           45              99              54
## 3:     P03              R           25              63              88
## 4:     P04              L            2              52              50
## 5:     P05             NA            0              90              90
##    TotalSamples NSamples_L NSamples_R
## 1:         2322       2322       2313
## 2:         2500       2480       1338
## 3:         2653       1659       2341
## 4:         5604       2912       2828
## 5:         3156       2846       2842



Figure: Tracking ratio (overall)

This plot provides a quick visualization of tracking ratio by eye for each subject.



Figure: Tracking ratio, by stimulus (all trials)

This plot provides a visualization of tracking ratio by stimulus. This allows you to quickly see if tracking ratio differs or is consistent across stimuli. Each circle represents a single presentation of the stimulus. Data points are jittered so that if stimuli are presented more than once, you will be able to see the tracking ratio during each presentation.

NOTE: If stimuli do not repeat (1 trial per stimulus), this figure will be redundant with Figure 1c.



Figure: Tracking ratio, by stimulus (stimulus averages)

This plot provides a visualization of tracking ratio by stimulus. If stimuli are presented more than once, the data point represents the tracking ratio (by eye) averaged across all presentations of that stimulus.

NOTE: If stimuli do not repeat (1 trial per stimulus), this figure will be redundant with Figure 1b.



Figure: Tracking ratio, by trial

This plot provides a visualization of tracking ratio by trial. This allows you to quickly see if a subject’s tracking ratio is consistent or changes over time.




Calibration accuracy

Deviation statistics are calculated using the longest fixation to the validation stimulus on each validation trial (based on the assumption that the longest fixation falling within the validation AOI on each trial most likely reflects looking at the validation stimulus). For each trial, the Euclidean distance (deviation) between each point-of-regard (POR) sample within the longest fixation and the center of the validation stimulus is calculated and averaged. These per-trial deviation values are then averaged over all validation trials to generate an average deviation value.

\[\begin{aligned} Deviation_{sample} &= \sqrt{(STIM_{x}-POR_{x})^2+(STIM_{y}-POR_{y})^2} \\ \\ Deviation_{fixation} &= mean(Deviation_{samples}) \\ \\ Deviation_{average} &= mean(Deviation_{LongestFixations}) \end{aligned}\]
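A small sketch of the per-fixation step, under the same definitions (variable names here are illustrative, not the toolbox's internals):

```r
## Sketch of the per-sample deviation calculation for one fixation.
## stim_x/stim_y: validation-stimulus center; por_x/por_y: POR samples
## within the longest fixation on that trial.
deviation_fixation <- function(por_x, por_y, stim_x, stim_y) {
  dev_sample <- sqrt((stim_x - por_x)^2 + (stim_y - por_y)^2)
  mean(dev_sample)  # Deviation_fixation
}

## Example: two samples near a stimulus centered at (480, 270),
## each 5 px from center (3-4-5 triangles)
deviation_fixation(por_x = c(483, 477), por_y = c(274, 266),
                   stim_x = 480, stim_y = 270)  # 5
```

Averaging these per-fixation values over all validation trials gives the Deviation_average reported in the table below.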


Table: Deviation values

  • BetterAccuracy: Eye with better (i.e., smaller) deviation value (not necessarily above threshold!)
  • AccuracyDiff: Absolute difference of L and R deviation values
  • stdev_L/R: Average standard deviation of deviation values within each fixation (higher values indicate less stable fixations)
  • Duration_L/R: Average fixation duration of fixations used to calculate deviations
##    Subject BetterAccuracy AccuracyDiff Deviation_L   stdev_L Duration_L
## 1:     P01              L    20.944346    25.02673 12.873421   1716.050
## 2:     P02              L     6.874652    43.72860  5.828800   1620.275
## 3:     P03              L     7.902798    71.87286 12.531135   1591.075
## 4:     P04              L    12.543479    23.42790  5.403864   2190.667
## 5:     P05              L    11.713289   106.00528  7.858005   2168.400
##    Deviation_R   stdev_R Duration_R
## 1:    45.97108  8.626094    1607.75
## 2:    50.60325 10.235618    1536.95
## 3:    79.77566  9.758520    1666.05
## 4:    35.97138  8.848672    2752.30
## 5:   117.71856 10.834453    1934.85



Figure: Deviation values

This plot shows average deviation values (i.e., average distance of estimated POR from center of validation stimulus) by eye for each subject. Smaller deviation values indicate a better calibration/more accurate tracking.



Figure: Deviation values, by validation stimulus

This plot shows deviation values for each validation stimulus. This allows you to see at a glance if high deviations are due to a generally poor calibration (consistently high values across stimuli) or outlier value(s). It also allows you to see if any participants are missing data for one or more validation stimuli (missing data will result in a displayed warning before the plot).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).



Figure: Deviation values, by validation trial

This plot shows deviation values for each validation trial in the order in which they were presented. If validation stimuli are presented more than once (e.g. at beginning, middle, and end of experiment) this allows you to see at a glance if deviations are consistent or change (drift) over time.

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).




Distance from screen

Participants’ average distance from the screen (using the eye-position Z value) is calculated for each eye, using only samples in which non-zero gaze data were collected (excluding non-experimental stimuli). This is probably not interesting/useful for determining which eye to use for analyses, but you might want to use the distance measure from the chosen eye when calculating a group average.
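A minimal sketch of this calculation (variable names are illustrative; zero gaze coordinates mark untracked samples):

```r
## Sketch: average distance from screen (eye-position Z, in mm),
## using only samples with non-zero gaze data.
eye_z  <- c(645.2, 0, 644.1, 643.9)  # eye-position Z values (mm)
gaze_x <- c(512, 0, 498, 505)        # gaze X; 0 marks an untracked sample
avg_z  <- mean(eye_z[gaze_x != 0])   # ~644.4 mm
```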



Table: Distance from screen (during experimental trials)

Distances are in mm.

##    Subject ScreenDistance_R ScreenDistance_L
## 1:     P01         644.6132         647.5126
## 2:     P02         632.3205         627.6549
## 3:     P03         605.0091         597.7266
## 4:     P04         591.3879         583.8653
## 5:     P05         542.5811         532.8221



Figure: Distance from screen

This plot shows each subject’s average distance from the screen across all experimental trials.




Eye recommendation by subject

The toolbox now uses the decision parameters outlined in the Overview section to recommend which eye provides ‘better’ data for each participant. The decision tree can be adjusted based on the priorities of the experiment (e.g., if tracking accuracy is less important than data quantity).

A flag of UNDETERMINED: missing data indicates that there were no fixations within the specified AOIs for the stimuli listed in ValidationStim, so the toolbox cannot make a recommendation based on tracking accuracy information. A flag of UNDETERMINED: conflict indicates that one eye has a better tracking ratio and the other eye has better accuracy, so the user should examine the data and make a determination of which eye to use.
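The decision logic described above can be sketched roughly as follows. This is a simplified illustration under the default thresholds, not the toolbox's exact implementation; the function name and arguments are hypothetical.

```r
## Sketch of the eye-recommendation decision tree (illustrative only).
## tr_* = tracking ratios (%); dev_* = average deviation values (px).
recommend_eye <- function(tr_l, tr_r, dev_l, dev_r,
                          tr_thresh = 5, dev_thresh = 25) {
  tr_better  <- if (abs(tr_l - tr_r) > tr_thresh)
                  ifelse(tr_l > tr_r, "L", "R") else NA
  acc_better <- if (!any(is.na(c(dev_l, dev_r))) &&
                    abs(dev_l - dev_r) > dev_thresh)
                  ifelse(dev_l < dev_r, "L", "R") else NA
  if (any(is.na(c(dev_l, dev_r)))) return("UNDETERMINED: missing data")
  if (!is.na(tr_better) && !is.na(acc_better) && tr_better != acc_better)
    return("UNDETERMINED: conflict")
  if (!is.na(tr_better))  return(tr_better)
  if (!is.na(acc_better)) return(acc_better)
  ## neither difference exceeds its threshold: fall back to accuracy
  ifelse(dev_l <= dev_r, "L", "R")
}

## Example (values similar to P02 in the tables below):
recommend_eye(tr_l = 99, tr_r = 54, dev_l = 43.7, dev_r = 50.6)  # "L"
```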

Figure: Eye recommendations



Table: Summary metrics/statistics table, with eye recommendations

This table is written to a .csv file and saved in the data directory.

Variable definitions:

  • EyeRec: This is the eye that has been determined to provide higher quality data and is recommended to be used for all subsequent analyses
  • Match_TrackAccuracy: This variable indicates if both tracking ratio and deviation metrics were better for the same eye (1) or if one eye had a better tracking ratio and the other eye had a lower validation deviation (0)
  • BetterTracking: Eye with higher tracking ratio
  • TrackingDiffAboveThresh: Indicates if difference in tracking ratio between eyes exceeds threshold value
  • BetterAccuracy: Eye with lower deviation value
  • AccuracyDiffAboveThresh: Indicates if difference in deviation value between eyes exceeds threshold value
  • TrackingDiff, TrackingRatio_L/R: Tracking ratio values for each eye and difference between values
  • TotalSamples: Total number of eye tracking samples (experimental stimuli only) in data file (denominator for tracking ratio)
  • NSamples_L/R: Number of samples with non-zero gaze data (experimental stimuli only) for each eye (numerator for tracking ratio)
  • AccuracyDiff, Deviation_L/R, stdev_L/R, Duration_L/R: Difference between L/R deviation values, average deviation value for each eye, average SD of deviation values within fixations, and average duration of the fixations used to calculate deviations
  • ScreenDistance_L/R: Average distance from screen [mm] for each eye (during experimental stimuli)
##    Subject EyeRec Match_TrackAccuracy BetterTracking
## 1:     P01   Left                   1              L
## 2:     P02   Left                   1              L
## 3:     P03  Right                   0              R
## 4:     P04   Left                   1              L
## 5:     P05   Left                   1              L
##    TrackingDiffAboveThresh BetterAccuracy AccuracyDiffAboveThresh
## 1:                       0              L                       0
## 2:                       1              L                       0
## 3:                       1              L                       0
## 4:                       0              L                       0
## 5:                       0              L                       0
##    TrackingDiff TrackingRatio_L TrackingRatio_R TotalSamples NSamples_L
## 1:            0             100             100         2322       2322
## 2:           45              99              54         2500       2480
## 3:           25              63              88         2653       1659
## 4:            2              52              50         5604       2912
## 5:            0              90              90         3156       2846
##    NSamples_R AccuracyDiff Deviation_L   stdev_L Duration_L Deviation_R
## 1:       2313    20.944346    25.02673 12.873421   1716.050    45.97108
## 2:       1338     6.874652    43.72860  5.828800   1620.275    50.60325
## 3:       2341     7.902798    71.87286 12.531135   1591.075    79.77566
## 4:       2828    12.543479    23.42790  5.403864   2190.667    35.97138
## 5:       2842    11.713289   106.00528  7.858005   2168.400   117.71856
##      stdev_R Duration_R ScreenDistance_R ScreenDistance_L
## 1:  8.626094    1607.75         644.6132         647.5126
## 2: 10.235618    1536.95         632.3205         627.6549
## 3:  9.758520    1666.05         605.0091         597.7266
## 4:  8.848672    2752.30         591.3879         583.8653
## 5: 10.834453    1934.85         542.5811         532.8221




Appendix: Distributions of L/R differences relative to thresholds

This is meant to provide a visualization of the distribution of differences between eyes for quantity and accuracy metrics to help the user select threshold values for what constitutes a meaningful difference (and thus should be taken into account when choosing which eye’s data to use for analyses).



Histogram: Tracking ratio L/R difference

Current threshold for a meaningful tracking ratio difference is 5%. Currently, 2 of 5 participants have a tracking ratio difference above this threshold.



Histogram: Deviation value (accuracy) L/R difference

Current threshold for a meaningful deviation value difference is 25 pixels. Currently, 0 of 5 participants have a deviation value difference above this threshold.




Supplementary Feature: Precision

This metric was inspired by the toolbox created by Sam Wass and colleagues (Wass et al., 2014).

Currently, precision is included in the toolbox for visualization purposes only. This information is not integrated into the eye recommendation made above.

NOTE: The more data you have, the longer this chunk takes to run. To exclude this section, set the prec_on parameter to FALSE.

Precision is meant to be an indicator of how stable/jittery the estimated POR is over time. Perfectly precise tracking would be characterized by sustained periods of very small sample-to-sample changes in X/Y coordinates (fixations) separated by short periods of large changes (saccades) or missing data (blinks or looks away from the screen). Poor precision is characterized by larger sample-to-sample changes in X/Y coordinates, even within fixations. In general, better precision is correlated with more accurate estimates of POR. The precision metric is an index of moment-to-moment stability in the location of the estimated POR (‘jitter’), and does not take stability of the track itself (‘flicker’) or missing data into consideration.

To calculate precision, the looking data are broken into short time windows (window size is defined in the code; default is 100 ms), the median X/Y gaze coordinates within each window are computed for each eye, and a difference score is computed for the X/Y coordinates at each sample relative to that window’s median value. Windows in which at least half of the samples are missing are dropped from the analysis. Precision scores for each variable (RightX, RightY, LeftX, LeftY) are calculated as the median difference score for each coordinate across all experimental stimuli, and overall precision scores are then calculated for each eye by averaging the X and Y precision scores. Lower precision scores are better (i.e., more stable gaze).

\[\begin{aligned} MedianX_{window} &= median(PORX_{sample1},PORX_{sample2},...,PORX_{sampleN}) \\ MedianY_{window} &= median(PORY_{sample1},PORY_{sample2},...,PORY_{sampleN}) \\ \\ DifferenceX_{sample} &= \left\lvert PORX_{sample} - MedianX_{window} \right\rvert \\ DifferenceY_{sample} &= \left\lvert PORY_{sample} - MedianY_{window} \right\rvert \\ \\ JitterX &= median(DifferenceX_{sample}) \\ JitterY &= median(DifferenceY_{sample}) \\ \\ Precision &= \frac{JitterX + JitterY}{2} \end{aligned}\]
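The windowed jitter computation for a single coordinate stream can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, windows here are defined in samples rather than milliseconds, and zero coordinates are treated as missing data (as in BeGaze exports).

```r
## Sketch of the windowed-precision (jitter) calculation for one
## coordinate stream (e.g., left-eye X). Window size is in samples.
precision_score <- function(por, window_size) {
  n_windows <- floor(length(por) / window_size)
  diffs <- unlist(lapply(seq_len(n_windows), function(w) {
    idx <- ((w - 1) * window_size + 1):(w * window_size)
    win <- por[idx]
    ## drop windows in which at least half the samples are missing
    if (sum(win == 0 | is.na(win)) >= window_size / 2) return(NULL)
    abs(win - median(win))  # sample-wise distance from window median
  }))
  median(diffs)  # JitterX (or JitterY)
}

## Example: stable gaze in window 1; window 2 is all-missing and dropped
por_x <- c(500, 501, 499, 500, 0, 0, 0, 0)
precision_score(por_x, window_size = 4)  # 0.5
```

Averaging the X and Y scores for an eye then gives that eye's overall precision value.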



Table: Precision

  • BetterPrecision: Which eye has better (lower) precision value across all stimuli
  • RPOR_precision: Overall precision value for right eye (average of X and Y precision scores)
  • LPOR_precision: Overall precision value for left eye (average of X and Y precision scores)
  • RPORX_diff_median (etc.): Precision score for right eye X-coordinates (median difference between RPOR_X and window median)
##    Subject BetterPrecision RPOR_precision LPOR_precision RPORX_diff_median
## 1:     P01               L         1.6000          1.450              1.35
## 2:     P02               L         1.8375          1.725              1.60
## 3:     P03               L         1.5250          1.500              1.35
## 4:     P04               L         0.6500          0.600              0.60
## 5:     P05               L         1.0250          0.700              1.10
##    RPORY_diff_median LPORX_diff_median LPORY_diff_median PrecisionDiff
## 1:             1.850               1.3              1.60        0.1500
## 2:             2.075               1.6              1.85        0.1125
## 3:             1.700               1.3              1.70        0.0250
## 4:             0.700               0.6              0.60        0.0500
## 5:             0.950               0.7              0.70        0.3250



Figure: Precision (example window)

Precision values for each eye at each time point are calculated by averaging the X and Y difference values (differences between the POR and the smoothed gaze). Values close to zero indicate stable tracking (little to no difference between smoothed and unsmoothed values). Occasional spikes indicate large sample-to-sample changes (e.g., saccades) and will not affect overall precision values (which are based on medians, not averages).

Because experiments often have thousands of samples, it is generally not useful to plot the entire experiment at once. By default, this figure plots a 500-sample window starting at the sample number defined by xwindow_min. To adjust the starting point of the plotted window, edit xwindow_min; to adjust the length of the plotted window, edit xwindow_max.



Figure: Precision, overall medians

Plot of overall median precision values, by eye. Precision values close to zero indicate good overall precision, while higher values indicate less precision (more noise/jitter). Median precision values are derived from the difference between raw POR and smoothed window medians at each sample (see example window figure).



Figure: Precision (Left eye)

This plot shows LEFT EYE raw gaze (POR) X and Y coordinates and smoothed window median coordinates for an example time window. Differences between POR and window medians are plotted near the x-axis.

Figure: Precision (Right eye)

This plot shows RIGHT EYE raw gaze (POR) X and Y coordinates and smoothed window median coordinates for an example time window. Differences between POR and window medians are plotted near the x-axis.




Contact

Contact Heidi Baumgartner (heidibaum@gmail.com) with questions or with feature requests.

Special thanks to Kristen Tummeltshammer, Andrew Lynn, and other members of the DCN Lab at Brown University for help with testing and feature suggestions.